A NEW MODEL FOR MANAGING AND DISTRIBUTING CONTENT ON THE WORLD WIDE WEB
The Versant Web Propagation Framework
Abstract
The Versant Web Propagation Framework offers an
entirely new model for managing high volume Web content publishing and
distribution. It has been adopted by Genuity, a Bechtel company, as the
foundation technology for managing Web content across one of the highest
bandwidth, redundant Internet backbones in the world. On top of the
Framework, Genuity has built a product called The Reflector which
manages access to replicated content throughout the network. Together,
they provide a means for building the large scale, distributed, high
performance, high availability Web sites that are essential for the
realization of the industry's vision of electronic commerce.
Introduction
Not since atomic power was going to be
"too cheap to meter" has any technology held such explosive
potential as the World Wide Web. But the visions that Web boosters hold
for global electronic commerce will remain only that, visions, unless
some fundamental problems are addressed. Chief among these are issues of
performance, reliability, availability, integrity, and scalability.
Unless the medium matures to the point that these are taken for granted
- as they are, for example, in today's telephony networks - the long-
awaited and eagerly lusted-after throngs of buyers and sellers will
never materialize.
This paper discusses the problems involved in building such a
robust, scalable infrastructure for Web-based electronic commerce. It
goes on to describe a new model for transmission management, content
distribution and transaction management on the Web. This model is based
on a distributed object database software architecture and a high
bandwidth, redundant transmission backbone that is being built today.
This framework radically changes the way information can be propagated
and accessed across the World Wide Web. The result can be
orders-of-magnitude improvements in performance, reliability,
availability, integrity and scalability for high end Web-based
publishing and transactioning applications. Versant believes this
approach to content distribution and transaction management will
dramatically accelerate the realization of a truly global electronic
marketplace.
Electronic Commerce - The Vision
The vision of a vast
electronic marketplace built on the World Wide Web is well understood:
millions, one day perhaps billions, of users "enter" the
marketplace through browser-enabled computers to buy goods and services
from millions of vendors who have "set up shop" through their
Web sites around the world. The timing, the character and the size of
this marketplace differ according to the imagination, ambition, and
self-interest of the visionary but most informed sources expect it to be
quite large, ranging up to the hundreds of billions of dollars per year
by the year 2000.
No matter how large it is or how soon it arrives, the character of
this marketplace is changing dramatically. The original "Electronic
Bulletin Board" Web with its static page publishing paradigm is
already giving way to a "Shopping Mall" Web characterized by
dynamic publishing with single, point-to-point transactioning. At the
very leading edge, players are building the foundations of a
"Commodities Exchange" Web where millions of participants will
broker, auction and arbitrage for value in everything from wholesale
electric power to financial instruments and airline tickets. Like the
telephony system where a variety of services are resident in the network
and available on demand (voice mail, call forwarding, virtual private
network, etc.), this Commodities Exchange Web will be rich with
transactioning support services, everything from intelligent roving
agents and software rental for specialized transactions to credit
authorization, escrow services, electronic bonding and collection
services. One scenario of this evolution is depicted in Exhibit 1.
Exhibit 1: The evolution of Internet computing - increasing
complexity and value.
Electronic Commerce - The Threat
But the emergence of such a world is by no means assured. Congestion and overload are already
crippling the Web's performance and limiting its attractiveness. These
problems have led such luminaries as Ethernet inventor Bob Metcalfe to
predict an imminent collapse of the Web, crushed under the burden of its
own success and the incremental, linear thinking of its builders. Such a
collapse is presaged by the cumulative impact of five converging forces:
1) The number of users continues to grow
at rates well above 100% per year and will continue to do so for several
years;
2) Usage per user is climbing steadily as the Web
continues to add more and more value and interest to existing users;
3) The density of content per user interaction is
expanding rapidly as multimedia displaces messaging as the primary
content of Web interaction;
4) The complexity of each user interaction is
exploding, especially as the Web evolves from a static, single-site,
page publishing medium to a transactional environment involving multiple
parties, locations, programs and data sources. And finally;
5) The network services to support such usage are
themselves exploding - everything from digital signatures and e-cash to
component subscription and applet metering services - all contending for
limited bandwidth, processor cycles and storage capacity.
These five factors compound one another to place exponential burdens
on the transmission and processing substrates over which all e-commerce
must pass. The old dictum that "quantity has a quality all its
own" was never so meaningful or portentous as it is here. Unless
this problem is addressed, messianic visions for global e-commerce will
soon morph into dystopian nightmares of "Internet Brownout,"
an online hell of data tones forever chirping busy signals, paralyzing
waits, stultifyingly stale content, and vast electronic wastelands of
broken links, dead pages, and abandoned sites. Kind of a Blade Runner
future for burned out cyber-dilettantes.
Inadequacy of Existing Solutions
Existing solutions offer
limited hope, for they are almost all premised on the assumption that
the continued, linear application of conventional resources can address
the problem. But look at the real problem as illustrated in Exhibit 2.
The most successful sites on the Internet today are those that bring
millions of customers to a single location every day. By dint of their
popularity, they are able to sell more advertising thereby adding more
features thereby attracting more users thereby.... in an
upwardly-spiraling, seemingly virtuous circle of popularity, profit and
growth.
Exhibit 2: Popular web sites attract millions of users a day but
suffer from network congestion, processor overload and single point of
failure.
But these and similarly successful sites are already the victims of three
debilitating, inter-related and ultimately destabilizing forces:
congestion getting into and out of the site; delay once inside; and
vulnerability to single point of failure. Congestion getting into the
site is traditionally treated by increasing the size of the pipe or the
network connection leading to the site. Of course, unless this is
simultaneously accompanied by an increase in the number of servers
within the site, it overwhelms the site's processing capacity and
results in Web server performance delays that, from
the user's perspective, are indistinguishable from network congestion delays.
And even if the number of servers is increased, the third problem, risk
from single point of failure, far from being ameliorated, is actually
aggravated.
Ever resourceful under the looming threat of diminished traffic and
reduced profits, Web site owners have adopted the workaround of site
mirroring to address these problems. Under this scheme, the content of
one site is duplicated in one or many other sites. At first blush, site
mirroring appears an ideal solution to the above cited problems of
network bottlenecks, performance congestion and single point of failure.
But site mirroring spawns a gaggle of its own problems: the user
must know that he needs to connect to a mirror site, not just the
primary site; he must know how to connect; and unless the congestion
problem is simply to be rolled over to the larger Internet
"commons", he must know which site is closest, either through
the closest physical network route (the one involving the least number
of hops) or through the network route providing the greatest available
bandwidth - whichever is most appropriate, depending on his application
needs. This requires a presumption of omniscience on the part of the
user that probably outstrips the insight, ingenuity and intrepidness of
all but the most committed of cybernauts.
Undaunted, enterprising Web site managers have devised still more
inventive workarounds. The most notable of these is the Round Robin
Domain Name Service (RR-DNS) under which users are allowed to believe
that they are connecting to a single site when in fact, the RR-DNS is
connecting them to the mirrored sites - kind of a Wizard of Oz for
Webdom - pay no attention to that little man behind the curtain.
But like simple site mirroring, RR-DNS suffers its own unique
problems:
- RR-DNS may direct the user to a site which is already busy;
- It may direct the user to a site which is further away in
network terms;
- The re-directed site may be out of date with respect to other
sites (RR-DNS does nothing to address synchrony of content in multiple
sites - a profound problem for transactionally-intensive applications);
- The site which hosts the RR-DNS process may itself be saturated;
- The RR-DNS site may be unreachable due to a network failure
which effectively takes all the duplicate sites off the air.
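The root of these weaknesses is easy to see in code. The following is a minimal, purely illustrative sketch in Java - the names are hypothetical and it is not how any particular DNS server is implemented - of the essential behavior of round-robin resolution: the resolver simply cycles through a fixed list of mirror addresses, knowing nothing about load, network distance, content freshness, or whether a mirror is even alive.

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin resolution. The resolver hands
// out mirror addresses in strict rotation; nothing here considers server
// load, network distance, content freshness or whether the chosen mirror
// is reachable at all.
public class RoundRobinResolver {
    private final List<String> mirrorAddresses;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinResolver(List<String> mirrorAddresses) {
        this.mirrorAddresses = mirrorAddresses;
    }

    // Every caller gets the next address in the cycle, regardless of which
    // mirror would actually serve the request best - or serve it at all.
    public String resolve() {
        int index = Math.floorMod(next.getAndIncrement(), mirrorAddresses.size());
        return mirrorAddresses.get(index);
    }
}

Every item in the list above follows from that blindness: the rotation continues even when the chosen mirror is saturated, distant, stale or down.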
Rethinking the Requirements
One begins to understand the
wisdom of the Greek myth of the Hydra, which grew two new heads for
each one Heracles could chop off. How to slay such a beast? Let's
reconsider for a moment what it is we're trying to accomplish. As we
said above, for e-commerce to become more than just a high-tech vision,
it has to look a lot more like the phone system in terms of the
following:
- Performance: Can users be assured of short
and predictable waits? Are they consistent from one interaction to
another and common for all players? If not, confidence in the integrity
of time-critical transactions - the sine qua non of electronic commerce
- will never emerge.
- Reliability: The near-religious aphorism that
exemplifies standards for the world's telephony industry is,
"You've got to have a dial tone." Unless users and vendors
have equal confidence in the reliability of datatone provisioning, they
will never commit serious resources to the medium.
- Availability: This is Reliability crossed with
Performance. The site might be up but if you can't get in - or out - it
is meaningless. Conversely, it might be fast but if it's unpredictable
it's too risky to bet your business on. It's got to be there when
you need it and responsive in the way you need it.
- Integrity: Everybody using common data must have
unquestioned assurance that it is the same everywhere, the more so as
the Web expands, as data propagates widely, and as transactions become
an ever-larger component. Any doubts about disparities in the currency
or availability of data will cause the whole of the system to be suspect
and therefore avoided.
- Scalability: Exponential growth means that the
overall infrastructure for Web commerce must scale hundreds - probably
thousands - of times without interruption or degradation of services.
Does anyone seriously believe that the linear approach being followed
today will be able to do this?
It was only after the phone system delivered this level of service
that it became the foundation for a global business system. Current
approaches to building a similarly robust, scalable, high performance
Web backbone fail this test because, fundamentally:
- they lack intelligence about system-wide loading;
- they do not maintain synchrony of data between duplicate sites;
- they do not balance network congestion considerations with
information about processor utilization;
- they presume an intelligence on the part of the user that is
unrealistic;
- they approach an exponential problem with a linear solution;
and,
- they maintain a single-point-of-failure architecture.
If the industry's hopes for a truly large scale, distributed,
content-rich, transactionally-intensive electronic marketplace are to
be realized, we must re-think the way the underlying transmission,
content publishing and transactioning infrastructures are built.
The Versant Web Propagation Framework
The Versant Web
Propagation Framework offers a new model for managing high volume Web
content publishing and distribution systems. It is built on the Versant
Object Database Management System, the most widely deployed ODBMS in the
world. (See the section below, "Why an Object Database?") The
Versant Web Propagation Framework has been adopted by Genuity, a Bechtel
company, as the foundation technology for managing Web content
distribution across one of the highest bandwidth, redundant Internet
backbones in the world today. On top of the Framework, Genuity has built
a product called The Reflector which manages access to replicated
content throughout the network. Together, they provide a means for
building the large scale, distributed, high performance, high integrity,
robust Web sites that are essential for the realization of the vision of
electronic commerce.
At the highest level, the Web Propagation Framework enables Web
sites to replicate any amount of Web content among multiple locations
throughout the world. The content from each site is made available to
users on a synchronous basis - all users see new or changed content at
exactly the same time, no matter from which site it originated or which
site they access. The Genuity Reflector directs user queries to the Web
site that is closest to the user in network terms or which has the
greatest available capacity at a given time. The result is a massively
distributed, single system image of Web-resident data that delivers
reliability, scalability and performance never before possible. A
conceptual representation of the system is presented in Exhibit 3.
Exhibit 3: Sites propagate content to one another and all display
it to users at the same time. The Reflectors send user requests to the
site which is closest in network terms or which has the most available
capacity.
The Framework utilizes the underlying distributed object database technology
of the Versant ODBMS. Content is replicated according to policies defined
by Web site administrators; the unit of replication can range from single HTML
pages or Java applets up to the data for individual transactions, entire
volumes or even entire Web sites. The timing and the events
"triggering" the replication process are also policy matters
and can be initiated automatically or driven manually, again under the
complete control of Web site administrators.
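As a rough illustration of what such administrator-defined policy might look like - the class, enum and field names below are hypothetical and are not the Framework's actual API - a replication policy simply pairs a unit of content with a triggering rule:

// Hypothetical sketch of an administrator-defined replication policy.
// It only illustrates the idea that both the unit of replication and the
// triggering event are policy choices under the administrator's control.
public class ReplicationPolicy {
    public enum Unit { HTML_PAGE, JAVA_APPLET, TRANSACTION_DATA, VOLUME, ENTIRE_SITE }
    public enum Trigger { ON_EVERY_CHANGE, ON_SCHEDULE, MANUAL }

    private final Unit unit;        // what gets propagated
    private final Trigger trigger;  // when propagation is initiated
    private final String schedule;  // e.g. "nightly"; used only with ON_SCHEDULE

    public ReplicationPolicy(Unit unit, Trigger trigger, String schedule) {
        this.unit = unit;
        this.trigger = trigger;
        this.schedule = schedule;
    }

    public Unit unit() { return unit; }
    public Trigger trigger() { return trigger; }
    public String schedule() { return schedule; }
}

Under such a scheme, a frequently changing price page might be propagated on every change at page granularity, while a rarely changing catalog volume might be propagated on a nightly schedule.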
The propagation process works like this: each site contains an
arbitrary number of Web Views - virtual partitions of files and
directories defined in the database - organized according to the change profile of
the Web site content (for example, some content changes often, some rarely)
and the business rules of the site's owner. Working changes
to any content in the site are made through the Web Views and are
recorded to a "change object" in the database before they are
displayed on the local site. This change object maintains information
about operational directories and their contents and is common to all of
the participants or "peers" in the system. Peer sites can be
added at any time with complete transparency to the operation of
existing sites.
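Conceptually - and only conceptually, since the class below is illustrative rather than the Framework's actual schema - a change object is a persistent record of which operational directories and files a working change touched, created before the change becomes visible on the local site and shared with every peer:

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Illustrative sketch of a persistent "change object". It records which
// operational directories and files a working change touched so that every
// peer site can apply the identical change. Not the Framework's real schema.
public class ChangeObject {
    private final String originSite;              // peer that made the change
    private final Date createdAt = new Date();
    private final List<String> touchedPaths = new ArrayList<>();

    public ChangeObject(String originSite) {
        this.originSite = originSite;
    }

    // Called for each directory or file modified through a Web View,
    // before the change is displayed on the local site.
    public void recordChange(String directoryOrFilePath) {
        touchedPaths.add(directoryOrFilePath);
    }

    public String originSite() { return originSite; }
    public Date createdAt() { return createdAt; }
    public List<String> touchedPaths() { return touchedPaths; }
}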
When changes are initiated, their propagation is negotiated via a
lightweight distributed two-phase commit protocol which commits first
the change list to the participating sites, then the content itself. The messages
between sites regarding changes are persistent, ensuring that all sites
maintain common state and that content integrity is assured throughout. The
Framework utilizes an n-way protocol whereby any site can act as a
"master" in the propagation process and any site can act as a
"slave." It provides an automatic, services-level mechanism
for detecting and resolving conflicts regarding propagation timing among
multiple sites. Knowledge and state of each of the participating sites
are maintained in each site's database.
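A highly simplified sketch of the two-phase idea follows, reusing the illustrative ChangeObject above; the interfaces are hypothetical and the real protocol is internal to the Framework. The point it illustrates is the ordering: the change list is committed to every peer first, and only then is the content itself shipped, so no site displays content whose change list has not been agreed.

import java.util.List;

// Hypothetical peer interface, for illustration only. In the Framework the
// messages between sites are persistent, so a peer that is temporarily
// unreachable can catch up once it is back on line.
interface PeerSite {
    boolean prepareChangeList(ChangeObject change); // phase 1: vote
    void commitChangeList(ChangeObject change);     // phase 1: commit the list
    void receiveContent(ChangeObject change);       // phase 2: ship the content
}

public class PropagationCoordinator {
    // Any site can play the coordinating ("master") role for a given change;
    // the others act as "slaves" for that change.
    public boolean propagate(ChangeObject change, List<PeerSite> peers) {
        // Phase 1: the change list is committed everywhere, or nowhere.
        for (PeerSite peer : peers) {
            if (!peer.prepareChangeList(change)) {
                return false; // a peer declined; no site displays the change
            }
        }
        for (PeerSite peer : peers) {
            peer.commitChangeList(change);
        }
        // Phase 2: only after the change list is agreed is the content shipped.
        for (PeerSite peer : peers) {
            peer.receiveContent(change);
        }
        return true;
    }
}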
On top of the Web Propagation Framework,
Genuity has built The Reflector, which itself can be replicated
throughout the system and which is updated as to the content,
congestion, and availability of all of the sites in the network.
Requests for access to specific Web content can be forwarded to the site
which is closest to the user in network terms or which has the highest
available capacity. In the event of any sites failing, the Reflector
automatically routes subsequent requests to sites which are alive,
returning to the down sites once they are brought back on line. Finally,
the entire mechanism and process is completely transparent to the user.
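To make the selection logic concrete - purely as a sketch, with hypothetical field names and scoring that are not Genuity's implementation - a Reflector-style router skips sites it knows to be down and, among the live ones, prefers the site with the fewest network hops to the user, breaking ties by spare capacity:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of Reflector-style request routing: dead sites are
// skipped automatically, and among live sites the request goes to the one
// closest in network terms, with spare capacity as the tie-breaker.
public class RequestRouter {
    public static class SiteStatus {
        final String address;
        final boolean alive;
        final int hopsFromUser;      // network distance to the requesting user
        final double spareCapacity;  // fraction of processing capacity free

        public SiteStatus(String address, boolean alive,
                          int hopsFromUser, double spareCapacity) {
            this.address = address;
            this.alive = alive;
            this.hopsFromUser = hopsFromUser;
            this.spareCapacity = spareCapacity;
        }
    }

    // Returns the address of the chosen site, or empty if every site is down.
    public Optional<String> chooseSite(List<SiteStatus> sites) {
        return sites.stream()
                .filter(s -> s.alive)
                .min(Comparator.comparingInt((SiteStatus s) -> s.hopsFromUser)
                        .thenComparingDouble(s -> -s.spareCapacity))
                .map(s -> s.address);
    }
}

When a down site is brought back on line its status simply flips back to alive and it re-enters the candidate set, which is the behavior described above.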
While the "manifestation" of content on peer sites can be
timed according to the dictates of the application being accessed (a
policy option of the Framework itself), the real value emerges when
system-wide synchrony is practiced. See, for example, Exhibit 4, which
displays different classes of applications and the implications for
integrity and transactional volume.
Exhibit 4: Different classes of applications require different
levels of integrity and synchrony, partly depending on transaction
volume. Policy decisions on site replication must reflect these
constraints.
As users enter sites under the
management of the Web Propagation Framework, their hyperlinking
activities may take them into and out of as many as a dozen or more
different Web sites. Say the user accesses a site for Directory
services, then links to a corporate Web site to gather information. From
there he goes to a site for time-delayed stock quotes and finally moves
to a near-realtime stock brokerage site.
The Reflector knows the user's network location as well as the
location and status of each of the sites under management of the Web
Propagation Framework. In managing each of these hyperlinked connections
it is therefore able to direct each request to the Web site that is closest to
the user in network terms (the one involving the fewest hops) or which
has the greatest available capacity. In any event, no matter which of
many available sites the user is directed to, he sees exactly the same
content as is displayed on every other site.
Exhibit 5: The Web Propagation Framework makes the same
information available to multiple sites at the same time, greatly
improving system reliability. Users access data locally, seeing dramatic
improvements in perceived Web site performance.
The consequences of this
architecture are quite dramatic relative to existing alternatives, and
are represented in Exhibit 5. Addressing the Requirements noted above,
the architecture delivers the following:
- Performance. Performance from the user's
point of view is improved dramatically. This is due to three factors: 1)
intelligent routing of user requests to the most appropriate sites; 2)
reduced network contention at the entry point to popular locations; and
3) distribution of processing burdens across multiple locations and
machines to those with the greatest available capacity.
- Availability. Availability is substantially
improved over single site alternatives, site mirroring and RR-DNS
options as multiple sites with redundant access points and intelligent
routing provide unlimited failsafe capabilities.
- Improved vendor economics. Significant cost
savings are realized by the application of many smaller machines in
distributed locations versus the "Battlestar Galactica"
phenomenon that overtakes popular sites and leads to quickly diminishing
performance per unit of money applied.
- Unlimited scalability. There are no practical
limits to the number of sites that can be managed through this approach.
For vendors earning advertising revenues based on user hits, this
architecture provides an almost infinitely elastic path to expansion
without interrupting existing operations.
- Complete transparency. Users never know that their
requests are being directed to different sites at different times based
on network and capacity considerations. All they see are dramatic
improvements in speed and exactly the same information no matter which
site they are connected to.
Taken together, these can provide the most robust, highest
performance, scalable Web site hosting, content distribution and
distributed Web-based transaction management system available in the
world today.
Why an Object Database?
But why an object database?
Couldn't this be built with a relational database? The answer is no. The Web simply
doesn't exist without objects. It is saturated with them. The C++
programs, browser plug-ins, CORBA and DCOM/OLE messaging, Java applets,
ActiveX components, and the content itself - the graphics, the audio,
video, schematics - are all objects. The rules of engagement for
transactions in cyberspace are being developed in object-oriented
languages and exist in programmatic form as objects: What are the terms
of the arbitrage? How is the escrow to be settled? What are the
demographics of the current user? How can I build a portfolio management
application from rented components and use it to run a Capital Asset
Pricing Model analysis on my 401K accounts that are spread across eleven
funds in six different management firms, then use the results to
re-allocate my portfolio? The dynamism, distribution, flexibility and
heterogeneity that are the very hallmark of the Web are only possible through
the use of object technology.
But does this necessarily imply the need for object storage and
management? Within three to five years there will be literally trillions of
such objects floating around in cyberspace. All will need storage,
managing, versioning, transacting, and metering - let's not forget we're
going to have to make money somewhere! But trying to do this with
relational technology that was invented in the early 1970s is like
trying to cut up tennis balls to make them fit into envelopes - the
two-dimensional tables that are the paradigm for relational storage. It can
be done, but it's hard, messy, time consuming and error prone. Relational
vendors call it "mapping". It not only exacts a huge overhead
for every line of code written, it produces programs that are
horrifically intractable - not what you need when Web life cycles are
measured in dog years and your apps need to be updated every day.
Worse, what do you do when you want your tennis balls back? The
technique is called "joining". Bring all the possible data to
a central location, look up indexes in each table that point to foreign
keys in other tables, join the tables together, cull through all the
data to see what you need and what you don't need from the common set,
and...are you still there? Your user isn't. He left hours ago in
disgust. For complex transactions involving large sets of data, ODB
users have reported publicly that they're getting 1000X improvements in
performance relative to RDBs. That's 1,000 times faster, not 1,000 percent. With
the way the Web is growing in terms of data volumes, users, complexity
of transactions and content, you simply cannot cut up, stuff into
envelopes, ship around and re-glue tennis balls fast enough to scale and
still meet performance objectives.
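The contrast is easiest to see in code. The classes below are a deliberately simplified illustration, not a Versant API: with object storage the relationship between a customer and her orders is stored directly as references, so retrieving the orders is reference traversal rather than a join reassembled from foreign keys at query time (for example, SELECT ... FROM orders o JOIN customers c ON o.customer_id = c.id, with hypothetical column names).

import java.util.ArrayList;
import java.util.List;

// Illustrative only. The customer-to-order relationship is stored as object
// references, so "give me this customer's orders" is a direct traversal -
// no mapping layer, no foreign-key lookup, no join.
public class Customer {
    private final String name;
    private final List<Order> orders = new ArrayList<>(); // stored references

    public Customer(String name) { this.name = name; }

    public void addOrder(Order order) { orders.add(order); }

    public List<Order> orders() { return orders; } // traversal, not a join
    public String name() { return name; }
}

class Order {
    private final String item;
    private final double amount;

    public Order(String item, double amount) {
        this.item = item;
        this.amount = amount;
    }

    public String item() { return item; }
    public double amount() { return amount; }
}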
Why Versant?
So why Versant? Versant was not only the
first ODB vendor to go public, it is also the most widely deployed ODBMS
in the world. Versant is the engine driving airline reservation systems,
regional electric power grid management systems, global logistics
management systems, advanced corporate billing systems, commodities
trading systems, and other impossibly hard, mission-critical systems
that simply couldn't get built with any other available technology.
Versant has been cited as a benchmark standard for the world's telephony
industry for building next generation network management, service
activation and service deployment applications. These are the hardest
applications on earth.
Not surprisingly, it is the same architecture that made Versant
attractive to the world's telephony industry that makes it such an ideal
fit for Web publishing, transaction and transmission management. The
requirements for both environments in terms of distribution,
scalability, performance, reliability and flexibility are almost
identical, although the standards in telephony are still much higher
than they are for the Web - remember, you've got to have a dial tone.
Versant was designed in the late 1980s by
some of the most prestigious RDB architects in the world, people who
knew everything about high volume transactioning databases but also knew
the world was no longer flat, simple, centralized or static. Its object
granularity enables it to scale across 65,000 distributed databases,
managing 281 trillion objects in each database. These are the
credentials and this is the technology that allow us to demonstrate a
new model for distributing and managing Web content in a manner that
greatly improves the chances that the visions the industry holds for the
Web can now be realized.
Conclusion
The promise for the Web as a medium for
conducting global electronic commerce is simply breathtaking. But if
those ambitions are to be realized, existing models for how to manage
Web content, transactions, and transmission will have to change. Current
solutions will simply not scale well enough and are already provoking
predictions of collapse.
Versant has developed the Versant Web Propagation Framework as a
foundation for building massively distributed, high performance, highly
scalable Web site systems. It has been embraced by Genuity, a Bechtel
company, which is using it to manage synchronous distribution of Web
content across one of the highest bandwidth Internet backbones in the
world. Versant welcomes the opportunity to work with industry
participants to apply its technology to other similarly ambitious
projects in pursuit of a truly global electronic marketplace.